{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "iris_xgboost.ipynb",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.8"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IiTxCwB7t6Ef",
        "colab_type": "text"
      },
      "source": [
        "# Training an Iris classifier using XGBoost\n",
        "\n",
        "_WARNING: you are on the master branch, please refer to the examples on the branch that matches your `cortex version`_\n",
        "\n",
        "In this notebook, we'll show how to train a classifier trained on the [iris data set](https://archive.ics.uci.edu/ml/datasets/iris) using XGBoost."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "j6QdLAUpuW7r",
        "colab_type": "text"
      },
      "source": [
        "## Install Dependencies\n",
        "First, we'll install our dependencies:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "BQE5z_kHj9jV",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "pip install xgboost==0.90 scikit-learn==0.21.* onnxmltools==1.5.* boto3==1.*"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yEVK-sLnumqn",
        "colab_type": "text"
      },
      "source": [
        "## Load the data\n",
        "We can use scikit-learn to load the Iris dataset:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "tx9Xw0x0lfbl",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from sklearn.datasets import load_iris\n",
        "from sklearn.model_selection import train_test_split\n",
        "\n",
        "iris = load_iris()\n",
        "X, y = iris.data, iris.target\n",
        "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=42)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "obGdgMm3urb2",
        "colab_type": "text"
      },
      "source": [
        "## Train the model\n",
        "We'll use XGBoost's [`XGBClassifier`](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier) to train the model:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "jjYp8TaflhW0",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import xgboost as xgb\n",
        "\n",
        "xgb_model = xgb.XGBClassifier()\n",
        "xgb_model = xgb_model.fit(X_train, y_train)\n",
        "\n",
        "print(\"Test data accuracy of the xgb classifier is {:.2f}\".format(xgb_model.score(X_test, y_test)))  # Accuracy should be > 90%"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Hdwu-wzJvJLb",
        "colab_type": "text"
      },
      "source": [
        "## Export the model\n",
        "Now we can export the model in the ONNX format:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "AVgs2mkdllRn",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from onnxmltools.convert import convert_xgboost\n",
        "from onnxconverter_common.data_types import FloatTensorType\n",
        "\n",
        "onnx_model = convert_xgboost(xgb_model, initial_types=[(\"input\", FloatTensorType([1, 4]))])\n",
        "\n",
        "with open(\"xgboost.onnx\", \"wb\") as f:\n",
        "    f.write(onnx_model.SerializeToString())"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ipVlP4yPxFxw",
        "colab_type": "text"
      },
      "source": [
        "## Upload the model to AWS\n",
        "\n",
        "Cortex loads models from AWS, so we need to upload the exported model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3IqsfyylxLhy",
        "colab_type": "text"
      },
      "source": [
        "Set these variables to configure your AWS credentials and model upload path:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "lc9LBH1uHT_h",
        "colab_type": "code",
        "cellView": "form",
        "colab": {}
      },
      "source": [
        "AWS_ACCESS_KEY_ID = \"\" #@param {type:\"string\"}\n",
        "AWS_SECRET_ACCESS_KEY = \"\" #@param {type:\"string\"}\n",
        "S3_UPLOAD_PATH = \"s3://my-bucket/iris-classifier/xgboost.onnx\" #@param {type:\"string\"}\n",
        "\n",
        "import sys\n",
        "import re\n",
        "\n",
        "if AWS_ACCESS_KEY_ID == \"\":\n",
        "    print(\"\\033[91m{}\\033[00m\".format(\"ERROR: Please set AWS_ACCESS_KEY_ID\"), file=sys.stderr)\n",
        "\n",
        "elif AWS_SECRET_ACCESS_KEY == \"\":\n",
        "    print(\"\\033[91m{}\\033[00m\".format(\"ERROR: Please set AWS_SECRET_ACCESS_KEY\"), file=sys.stderr)\n",
        "\n",
        "else:\n",
        "    try:\n",
        "        bucket, key = re.match(\"s3://(.+?)/(.+)\", S3_UPLOAD_PATH).groups()\n",
        "    except:\n",
        "        print(\"\\033[91m{}\\033[00m\".format(\"ERROR: Invalid s3 path (should be of the form s3://my-bucket/path/to/file)\"), file=sys.stderr)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NXeuZsaQxUc8",
        "colab_type": "text"
      },
      "source": [
        "Upload the model to S3:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "YLmnWTEVsu55",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import boto3\n",
        "\n",
        "s3 = boto3.client(\"s3\", aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY)\n",
        "print(\"Uploading {} ...\".format(S3_UPLOAD_PATH), end = '')\n",
        "s3.upload_file(\"xgboost.onnx\", bucket, key)\n",
        "print(\" ✓\")"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aR-mmcUzyCV3",
        "colab_type": "text"
      },
      "source": [
        "<!-- CORTEX_VERSION_MINOR -->\n",
        "That's it! See the [example](https://github.com/cortexlabs/cortex/tree/master/examples/xgboost/iris-classifier) for how to deploy the model as an API."
      ]
    }
  ]
}
